NEURAL NETWORKS FOR SEQUENTIAL DISCRIMINATION OF RADAR TARGETS


J.  A.  Haimerl  and  E.  Geraniotis 


Abstract 

In this paper, perceptron neural networks are applied to the problem of discriminating
between two classes of radar returns. The perceptron neural networks are used as
nonlinearities in two-threshold sequential discriminators which act upon samples of
the radar return. The test statistic compared to the thresholds is of the form
$T_n(\mathbf{Z}) = \sum_{j=1}^{n-K+1} g(Z_j, Z_{j+1}, \ldots, Z_{j+K-1})$, where $Z_i$,
$i = 1, 2, 3, \ldots$ are the radar samples and $g(\cdot)$ is the nonlinearity formed by
the neural network. Numerical results are presented and compared to existing
discrimination schemes.

I.  Introduction  and  Motivation 

A problem often encountered in military radar applications is the binary classification
problem. That is, after detecting an object, the radar must classify it as a member of one
of two classes: target or decoy. We shall denote these two classes (or hypotheses) as $H_1$
and $H_0$, respectively.

The discriminator (i.e., classification or automatic target recognition circuitry) obtains
a sequence of data, $Z_1, Z_2, \ldots, Z_n$, denoted by the vector $\mathbf{Z}$, from the radar
returns of the detected object. This sequence may be samples of the radar return envelope
signal, the in-phase and quadrature signals, the phase signal, or any other
information-carrying signal. In any case, the data sequence is a random process and is
described by the probability density functions (pdfs) $f_i(z_1, z_2, \ldots, z_n)$, where
$i = 0, 1$ represents hypothesis $H_i$. More specifically, we consider


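$$
\begin{aligned}
H_1 &: \{Z_i\}_{i=1}^{n} \text{ has pdf } f_1(z_1, z_2, \ldots, z_n) = f_1(\mathbf{z}) \\
H_0 &: \{Z_i\}_{i=1}^{n} \text{ has pdf } f_0(z_1, z_2, \ldots, z_n) = f_0(\mathbf{z})
\end{aligned}
\tag{1}
$$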


where $\mathbf{z}$ represents the n-tuple $(z_1, z_2, \ldots, z_n)$. It is assumed that the data
sequence is stationary, and that the data sequence may be correlated and non-Gaussian.

For  the  discriminator  to  work  properly,  it  must  be  designed  so  that  the  probability  of  it 
making  an  error  is  small.  Quick  decisions  are  also  desirable,  so  n  should  be  relatively  small. 
If  the  probability  density  functions  of  the  data  were  known,  likelihood  ratio  functions  could 
be  used  for  the  test.  A  likelihood  ratio  test  has  the  form 


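$$
d(\mathbf{Z}) =
\begin{cases}
1, & \text{if } \dfrac{f_1(\mathbf{Z})}{f_0(\mathbf{Z})} > \eta, \\[1ex]
0, & \text{if } \dfrac{f_1(\mathbf{Z})}{f_0(\mathbf{Z})} < \eta,
\end{cases}
\tag{2}
$$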


where $\eta$ is a constant to be determined. Hypothesis $H_i$ is chosen by the discriminator
when $d(\mathbf{Z}) = i$, $i = 0, 1$. Likelihood ratio tests are well known and optimal in the
Bayes, Neyman-Pearson, and minimax senses [1]; the choice of $\eta$ depends upon which
criterion the designer chooses to optimize.

Typically, the n-dimensional probability density functions are not known or result in
a structure too complex to implement. If the lower order probability density functions (i.e.,
with order < n) are known or can be estimated, discriminators with a test statistic of the
form

$$
T_n(\mathbf{Z}) = \sum_{j=1}^{n-K+1} g(Z_j, Z_{j+1}, \ldots, Z_{j+K-1})
\tag{3}
$$


can be developed. The function $g(x_1, x_2, \ldots, x_K)$ is a suitably chosen nonlinearity. The
test statistic can be used in both block and sequential tests. The fact that $K < n$ implies
that a discriminator will have a simpler structure since it need only have a memory of
$K$ samples as opposed to $n$. The nonlinearity $g(x_1, x_2, \ldots, x_K)$, a function of only $K$
variables, will be simpler in general than a function of $n$ variables when $K < n$. However,
since $K < n$, the test in general will be suboptimal relative to the n-dimensional
likelihood ratio test.

A  block  test  could  be  implemented  as 

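$$
d(\mathbf{Z}) =
\begin{cases}
1, & \text{if } T_n(\mathbf{Z}) > \eta, \\
0, & \text{if } T_n(\mathbf{Z}) < \eta,
\end{cases}
\tag{4}
$$

where $\eta$ is a decision threshold.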

A sequential test could proceed as follows: compare the test statistic $T_n$ to two
thresholds $a$ and $b$. If $T_n > b$, declare $H_1$ and terminate the test. If $T_n < a$, declare
$H_0$ and terminate the test. If $a < T_n < b$, then obtain the next sample $Z_{n+1}$, compute
$T_{n+1}$, and repeat the test. Terminate the sequential test with a block test if a decision
has not been made by the maximum number of data samples $N$.
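
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sequential_test` and its parameters are hypothetical, the increments of the test statistic are assumed to be precomputed values of the nonlinearity, the thresholds are treated as inclusive, and the count returned is the number of increments consumed (which trails the sample count by K - 1 when the nonlinearity has memory).

```python
from typing import Iterable, Tuple


def sequential_test(increments: Iterable[float], a: float, b: float,
                    max_steps: int, eta: float, t0: float = 0.0) -> Tuple[int, int]:
    """Truncated two-threshold test; returns (decision, steps_used).

    decision is 1 for H1 and 0 for H0. All names here are illustrative.
    """
    t = t0
    n = 0
    for n, inc in enumerate(increments, start=1):
        t += inc                      # T_n = T_{n-1} + g(...)
        if n >= max_steps:            # truncation point: fall through to block test
            break
        if t >= b:                    # upper threshold reached: declare H1
            return 1, n
        if t <= a:                    # lower threshold reached: declare H0
            return 0, n
    # Block (one-threshold) test at truncation, or when the data run out.
    return (1 if t > eta else 0), n
```

For example, a stream of increments that are all +1 with thresholds of -20 and 20 terminates after 20 steps with a decision for H1.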

Memoryless tests (i.e., $K = 1$) have been studied in [2] and [3] for block and sequential
tests, respectively. Both employ central limit theorems for dependent observations
(especially for stationary mixing processes) to derive linear integral equations, which they
solve for the optimal nonlinearity $g(x)$. More recent work has addressed the
one-step-memory case, where $K = 2$ (see [4]); it showed that probability density functions
up to degree 4 are needed to solve for the one-step-memory nonlinearity.

Estimation of probability density functions above degree 2 becomes impractical due to
computer processing limitations, and knowledge of higher order pdfs is not likely.
Therefore, a different approach to designing a discriminator is presented in this paper.
Using a priori data sequences from each hypothesis (i.e., field data), a perceptron neural
network is trained with a supervised learning algorithm to produce desirable outputs. This
perceptron neural network is then implemented in a structure to act as the nonlinearity
$g(x_1, x_2, \ldots, x_K)$ in equation (3). First, we shall review some basics of perceptron
neural networks.

II.  Perceptron  Neural  Network  Basics 

For  a  thorough  introduction  to  perceptron  neural  networks,  refer  to  [5].  Perceptron 
neural  networks  are  interconnected  layers  of  simple  processing  units  called  perceptrons.  A 
perceptron is specified by its weighting vector $\mathbf{w} = (w_0, w_1, \ldots, w_{K-1})^T$, its
offset value $\theta$, and its nonlinearity. Usually the nonlinearity is a sigmoid

$$
f(y) = \frac{1}{1 + e^{-y}}.
\tag{5}
$$


Note that we use the terms perceptron and node interchangeably. We also refer to the
offset value $\theta$ as the node offset value.

The perceptron operates by taking the input vector
$\mathbf{x} = (x_0, x_1, \ldots, x_{K-1})^T$ and forming the dot product

$$
\sum_{i=0}^{K-1} x_i w_i = \mathbf{x}^T \mathbf{w}.
\tag{6}
$$

From the dot product, an offset value $\theta$ is subtracted to get the result
$y = \mathbf{x}^T \mathbf{w} - \theta$; $y$ is then passed through the nonlinearity, and $f(y)$ is
output. The nonlinearity $f(y)$ limits the outputs to values between 0 and 1. Very large
values of $y$ result in values of $f(y)$ near 1. As $y$ approaches $-\infty$, $f(y)$ approaches 0.
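
As a minimal sketch of equations (5) and (6), the following Python function computes the output of a single perceptron. The names `sigmoid` and `perceptron` are illustrative, and the two-branch form of the sigmoid is an implementation choice made here to avoid floating-point overflow for large negative arguments; it is not something the text prescribes.

```python
import math
from typing import Sequence


def sigmoid(y: float) -> float:
    """Sigmoid f(y) = 1 / (1 + exp(-y)), written to avoid overflow."""
    if y >= 0.0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)                   # safe: y < 0, so exp(y) < 1
    return e / (1.0 + e)


def perceptron(x: Sequence[float], w: Sequence[float], theta: float) -> float:
    """Output f(x . w - theta) of one node with weights w and offset theta."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    y = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return sigmoid(y)
```

With a zero net input, for instance `perceptron([1.0, 2.0], [0.5, -0.25], 0.0)`, the node sits exactly on its hyperplane and outputs 0.5.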

If  the  sigmoid  were  replaced  by  a  hard  limiter, 


$$
f(x) =
\begin{cases}
1, & \text{if } x > 0; \\
0, & \text{otherwise,}
\end{cases}
\tag{7}
$$

the neural network would essentially separate the input space, $\mathbb{R}^K$, by a hyperplane.
Input vectors lying on one side of the hyperplane would be output as a 0, while input
vectors on the other side of the hyperplane would be output as a 1. The hyperplane is
determined by the weights and node offset value of the perceptron. If the hard limiter were
replaced by the sigmoid, the decision region would become soft.

More complex decision regions can be formed by utilizing multiple hyperplanes. Decision
regions can be formed by using a perceptron to form each hyperplane of a complex
region. The output of each perceptron can then be fed into an AND gate or, better
yet, another perceptron with weights and an offset appropriately set to simulate an AND
function. This leads to the concept of multi-layer perceptron neural networks.
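
To make the AND construction concrete, the sketch below builds the open unit square in the plane from four hard-limiter perceptrons (one per edge) followed by a fifth perceptron acting as an AND. The region, the weights, and all function names are chosen purely for illustration; the offset of 3.5 on the AND node is one workable choice, since the sum of four binary inputs exceeds it only when all four are 1.

```python
from typing import Sequence, Tuple


def hard_limiter(y: float) -> int:
    """Equation (7): 1 if the argument is positive, 0 otherwise."""
    return 1 if y > 0 else 0


def hl_perceptron(x: Sequence[float], w: Sequence[float], theta: float) -> int:
    """Perceptron with a hard-limiting nonlinearity."""
    return hard_limiter(sum(xi * wi for xi, wi in zip(x, w)) - theta)


# First layer: one (weights, offset) pair per half-plane of the unit square.
HALF_PLANES = [
    ((1.0, 0.0), 0.0),    # x > 0
    ((-1.0, 0.0), -1.0),  # x < 1, written as -x + 1 > 0
    ((0.0, 1.0), 0.0),    # y > 0
    ((0.0, -1.0), -1.0),  # y < 1, written as -y + 1 > 0
]


def in_unit_square(point: Tuple[float, float]) -> int:
    """Second layer: AND of the four half-plane outputs."""
    first_layer = [hl_perceptron(point, w, theta) for w, theta in HALF_PLANES]
    # All-ones weights with offset 3.5 fire only when every input is 1.
    return hl_perceptron(first_layer, [1.0] * len(first_layer), 3.5)
```

A point such as (0.5, 0.5) yields 1, while (1.5, 0.5) yields 0 because the second half-plane test fails.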

Multiple-layer  perceptron  neural  networks  take  the  outputs  of  the  perceptrons  on 
a  layer  and  use  them  as  inputs  to  the  next  higher  level  of  perceptrons  (see  Figure  1). 
Networks  of  this  type  are  usually  called  feed-forward  neural  networks.  As  demonstrated 
in  the  above  discussion,  a  single  perceptron  can  only  divide  the  decision  space  with  a 
hyperplane.  But  it  has  been  shown  that  a  two-layer  perceptron  neural  network  can  form 
any convex decision region [5]. A convex region is a region in which any two points can
be connected by a line segment which lies entirely within the region. A third layer of nodes can
allow  the  network  to  form  any  arbitrary  decision  region  [5]  (assuming  enough  nodes  are 
allocated  to  the  correct  layers). 

To  form  a  desired  decision  region,  the  weights  and  node  offset  values  for  each  node 
in  each  layer  of  a  neural  network  must  be  specified.  This  would  be  a  difficult  task  even 
if  the  decision  region  were  known.  But,  for  many  problems,  the  decision  region  is  not 
known  because  the  statistical  models  of  the  data  are  not  known.  Training  algorithms  to 
form  appropriate  decision  regions  exist  for  perceptron  neural  networks.  These  algorithms 
typically  present  the  training  data  to  the  network  along  with  a  desired  response  and  the 
network  weight  values  and  node  offset  values  are  adjusted  to  force  the  actual  network 
response towards the desired response. One such algorithm is the back-propagation
algorithm. The back-propagation algorithm is a gradient search method (searching over
$\mathbf{w}$ and $\theta$), which minimizes the squared error of the neural network outputs [6].
Note that the back-propagation algorithm requires the nodes to have differentiable
nonlinearities, such as the sigmoid.
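
The gradient search can be illustrated with a small, self-contained sketch of one back-propagation step for a network with a single hidden layer of sigmoid nodes. This is not the implementation of [6]; the class name `TwoLayerNet`, the uniform weight initialization, and the per-pattern (online) update are assumptions made for the example. Each node computes $f(\mathbf{x}^T\mathbf{w} - \theta)$, so the offset gradient carries the opposite sign to the weight gradient.

```python
import math
import random


def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y)) if y >= 0 else math.exp(y) / (1.0 + math.exp(y))


class TwoLayerNet:
    """Hidden layer plus output layer of sigmoid perceptrons (illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        u = lambda: rng.uniform(-0.5, 0.5)   # assumed initialization
        self.w1 = [[u() for _ in range(n_in)] for _ in range(n_hidden)]
        self.th1 = [u() for _ in range(n_hidden)]
        self.w2 = [[u() for _ in range(n_hidden)] for _ in range(n_out)]
        self.th2 = [u() for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) - th)
             for row, th in zip(self.w1, self.th1)]
        o = [sigmoid(sum(w * hi for w, hi in zip(row, h)) - th)
             for row, th in zip(self.w2, self.th2)]
        return h, o

    def train_step(self, x, target, gain):
        """One gradient step on 0.5 * sum_j (t_j - o_j)^2; returns the pre-update error."""
        h, o = self.forward(x)
        # Error signals at the output nodes, then propagated back to the hidden nodes.
        d_out = [(oj - tj) * oj * (1.0 - oj) for oj, tj in zip(o, target)]
        d_hid = [sum(d_out[j] * self.w2[j][k] for j in range(len(o))) * h[k] * (1.0 - h[k])
                 for k in range(len(h))]
        for j, dj in enumerate(d_out):
            for k in range(len(h)):
                self.w2[j][k] -= gain * dj * h[k]
            self.th2[j] += gain * dj          # dE/dtheta = -delta
        for k, dk in enumerate(d_hid):
            for i in range(len(x)):
                self.w1[k][i] -= gain * dk * x[i]
            self.th1[k] += gain * dk
        return 0.5 * sum((tj - oj) ** 2 for tj, oj in zip(target, o))
```

Repeated calls to `train_step` with the same pattern and a small gain should drive the returned error downward, which is a convenient sanity check on the signs of the updates.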

III. The Discriminator

We  start  by  defining  the  structure  of  our  sequential  discriminator.  Our  discriminator 
utilizes  a  test  statistic  of  the  form 

$$
T_n(\mathbf{Z}) = \sum_{j=1}^{n-K+1} g(Z_j, Z_{j+1}, \ldots, Z_{j+K-1}).
\tag{8}
$$

A two-threshold test is implemented, using the constants $a$ and $b$. Upon obtaining a
new data sample, $Z_n$, the discriminator computes the test statistic $T_n$. If $T_n$ reaches $b$,
then the discriminator chooses $H_1$ and terminates the test. If $T_n$ drops to $a$, then the test
terminates and the discriminator chooses $H_0$. If $T_n$ lies between $a$ and $b$, then another
sample $Z_{n+1}$ is obtained, $T_{n+1}$ is computed, and the entire test is repeated. This process
continues until either a decision is made, or the $N$-th sample is reached. Upon obtaining
the $N$-th sample, $T_N$ is computed and a one-threshold test is performed. The statistic
$T_{K-1}$ is initialized to a value in the interval $(a, b)$.

We now restrict the class of nonlinearities of the form $g(x_1, x_2, \ldots, x_K)$ to have a
range with maximum absolute value of $r$. That is, we require

$$
|g(x_1, x_2, \ldots, x_K)| \le r \quad \text{for all possible values of the } K\text{-tuple } (x_1, x_2, \ldots, x_K).
\tag{9}
$$

This restriction leads to a suboptimal discriminator in general, but allows us to obtain a
solution.

Assuming that $r$, $a$, and $b$ are all specified constants, the structure of our test
allows us to scale $r$, $a$, and $b$ by $1/r$ to get a test with a nonlinearity with a maximum
absolute value of 1. We continue to denote the rescaled thresholds by $a$ and $b$. This
rescaling of $r$ to 1 allows us to utilize a perceptron neural network with a sigmoid
nonlinearity on its nodes in the following paragraphs.

To find the optimal nonlinearity within our class, we first consider the optimal paths
that the test statistic $T_n$ can take under each hypothesis. By optimal path we mean
the path that $T_n$ should take to minimize the number of samples needed to cross the
correct threshold under the appropriate hypothesis. The quickest path to reach
a threshold is when the discriminator takes a step of magnitude 1 in the appropriate
direction upon obtaining each new data sample. That is, for each new data sample, the test
statistic under $H_1$ is incremented by $+1$, while the test statistic under $H_0$ is incremented
by $-1$. If the data sequence $\{Z_i\}_{i=1}^{\infty}$ is obtained by sampling some continuous process
with a uniform sampling period $T$, then the optimal path for $T_n$ would lie on a straight line
with slope $+\frac{1}{T}$ for $H_1$ and slope $-\frac{1}{T}$ for $H_0$. Figure 2 depicts these paths. Thus,
for an ideal discriminator (that is, a discriminator which never makes mistakes and always
uses the minimum number of samples possible), the statistics of the nonlinearity should be


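$$
\begin{aligned}
E_1[g(Z_1, Z_2, \ldots, Z_K)] &= 1 \\
E_0[g(Z_1, Z_2, \ldots, Z_K)] &= -1 \\
\mathrm{Var}_1[g(Z_1, Z_2, \ldots, Z_K)] &= 0 \\
\mathrm{Var}_0[g(Z_1, Z_2, \ldots, Z_K)] &= 0
\end{aligned}
\tag{10}
$$

where $E_i$ and $\mathrm{Var}_i$ denote expectation and variance, respectively, under hypothesis $H_i$.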

We cannot expect a real discriminator to achieve the statistics of the above equations.
However, we can choose the nonlinearity to minimize some performance measure, such
as a mean squared error criterion of $g$ about its desired values. We show that the
back-propagation algorithm can be used to minimize a related mean squared error criterion.

We form a nonlinearity by constructing a perceptron neural network with $K$ inputs
$x_1, x_2, \ldots, x_K$ and two outputs which are functions of the inputs (and the
weights/offsets for each perceptron in the network), $o^1(x_1, x_2, \ldots, x_K)$ and
$o^0(x_1, x_2, \ldots, x_K)$. To simplify the notation, we denote the output nodes as $o^1$ and
$o^0$. During training, the desired values of the output nodes are (1 0) for inputs from $H_0$
and (0 1) for inputs from $H_1$. Our notation (x y) implies that $o^0 = x$ and $o^1 = y$. The
nonlinearity $g(x_1, x_2, \ldots, x_K)$ is formed by

$$
g(x_1, x_2, \ldots, x_K) = o^1(x_1, x_2, \ldots, x_K) - o^0(x_1, x_2, \ldots, x_K),
$$

or with simplified notation,

$$
g = o^1 - o^0.
\tag{11}
$$

We wish the nonlinearity to be such that $g$ is close to 1 for inputs from $H_1$
and $-1$ for inputs from $H_0$. We choose a performance measure which involves the mean
squared error of $o^1$ and $o^0$ about their desired values for each hypothesis:

$$
S = E_0\left[(1 - o^0)^2 + (0 - o^1)^2\right] + E_1\left[(0 - o^0)^2 + (1 - o^1)^2\right].
\tag{12}
$$

We would like the weight and node offset values of each perceptron in our neural network
to have values which minimize equation (12).

Recall that the back-propagation algorithm [6] is a gradient descent algorithm which
minimizes the performance measure

$$
E = \frac{1}{2} \sum_{p} \sum_{j} (t_{jp} - o_{jp})^2
\tag{13}
$$

where $t_{jp}$ is the desired output for node $j$ associated with input pattern $p$, and $o_{jp}$ is
the actual value of the output node $j$ associated with input pattern $p$. Suppose we have $P$
K-tuples from each hypothesis available for training the neural network. We also have
$j = 0, 1$ for the two output nodes $o^0$ and $o^1$, respectively. We can rewrite (13) as


$$
E = \frac{1}{2} \sum_{p=0}^{P-1} \left\{(1 - o^0)^2 + (0 - o^1)^2\right\}
  + \frac{1}{2} \sum_{p=P}^{2P-1} \left\{(0 - o^0)^2 + (1 - o^1)^2\right\}
\tag{14}
$$


where the first sum is over the $H_0$ training patterns and the second sum is over the $H_1$
training patterns. The problem of minimizing $E$ is equivalent to minimizing $E$ scaled by
a positive constant. Thus minimizing (14) is equivalent to minimizing


$$
\frac{1}{P} \sum_{p=0}^{P-1} \left\{(1 - o^0)^2 + (0 - o^1)^2\right\}
+ \frac{1}{P} \sum_{p=P}^{2P-1} \left\{(0 - o^0)^2 + (1 - o^1)^2\right\}.
\tag{15}
$$


Now as $P \to \infty$ we have

$$
\frac{2}{P} E \to E_0\left[(1 - o^0)^2 + (0 - o^1)^2\right] + E_1\left[(0 - o^0)^2 + (1 - o^1)^2\right] = S,
\tag{16}
$$

which is our desired performance measure. Consequently, the back-propagation algorithm
is a reasonable algorithm to be utilized for our perceptron neural network nonlinearity.
Thus, using this nonlinearity we can form the test statistic of equation (8).

Figure 3 shows the implementation of our test. The incoming data samples are passed
through a tapped delay line. The $K$ taps are the inputs to the perceptron neural network.
The difference of the outputs of the neural network is formed and added to the test statistic
$T_j$. The subscript $j$ corresponds to the values associated with the $j$-th data sample.
The sample number $j$ is compared to $N$. If $j$ reaches $N$, then a one-threshold test is
performed (in this figure the threshold is 0). If $j$ is less than $N$, then a two-threshold test
is performed.
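
The structure just described can be sketched as follows, with a `collections.deque` of length K standing in for the tapped delay line. The function name `discriminate`, its parameters, the oldest-first tap ordering, and the inclusive thresholds are assumptions of this sketch; `g` is any callable returning the difference of the two network outputs.

```python
from collections import deque
from typing import Callable, Iterable, Sequence, Tuple


def discriminate(samples: Iterable[float],
                 g: Callable[[Sequence[float]], float],
                 K: int, a: float, b: float, N: int,
                 t0: float = 0.0, final_threshold: float = 0.0) -> Tuple[int, int]:
    """Return (decision, samples_used); decision is 1 for H1, 0 for H0."""
    taps = deque(maxlen=K)            # tapped delay line holding the last K samples
    t = t0                            # test statistic, started inside (a, b)
    j = 0
    for j, z in enumerate(samples, start=1):
        taps.append(z)
        if len(taps) == K:            # no increment until the delay line is full
            t += g(tuple(taps))
        if j >= N:                    # N-th sample: switch to the one-threshold test
            break
        if t >= b:
            return 1, j
        if t <= a:
            return 0, j
    return (1 if t > final_threshold else 0), j
```

With K = 4 and a nonlinearity that always returns +1, the first increment occurs at the fourth sample, so thresholds of -20 and 20 are crossed at sample 23.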

IV.  Training 

The neural network used in our sequential discrimination scheme operates on K-tuples
$(Z_{K-1+j}, Z_{K-2+j}, \ldots, Z_j)$, which are formed from the incoming data sequence
$\{Z_j\}_{j=1}^{\infty}$ on which the discriminator must make a decision of $H_1$ or $H_0$. The neural
network may have two or three layers of nodes, but it will always have two output nodes on
the output layer.

The neural network is trained using the back-propagation algorithm and the training
data set. The training data set consists of $M$ sample paths (i.e., sequences) of length $N$
from each hypothesis. These training data are defined as $\zeta^i_{m,j}$, where $i = 0, 1$ denotes
the hypothesis ($H_0$ or $H_1$), $m = 0, 1, \ldots, M - 1$ denotes the sample path number, and
$j = 0, 1, \ldots, N - 1$ denotes the sample number. The desired responses for the neural
network are (1 0) for $H_0$ and (0 1) for $H_1$. Our notation (a b) implies that the output
node 0 outputs a and the output node 1 outputs b.

The training process proceeds as follows. The first K-tuple from the first sample path
from $H_0$, $(\zeta^0_{0,0}, \zeta^0_{0,1}, \ldots, \zeta^0_{0,K-1})$, is presented to the neural network inputs.
The back-propagation algorithm is performed using (1 0) as the desired output. Then the
first K-tuple from the first sample path from $H_1$, $(\zeta^1_{0,0}, \zeta^1_{0,1}, \ldots, \zeta^1_{0,K-1})$, is
presented to the neural network inputs. Back-propagation is performed with the desired
response of (0 1). Then the second K-tuple from the first $H_0$ sample path,
$(\zeta^0_{0,1}, \zeta^0_{0,2}, \ldots, \zeta^0_{0,K})$, is presented to the network for back-propagation. Then
the second K-tuple from the first $H_1$ sample path, $(\zeta^1_{0,1}, \zeta^1_{0,2}, \ldots, \zeta^1_{0,K})$, is
presented to the network for back-propagation. When all the K-tuples (of ordered adjacent
samples) from the first sample path for $H_0$ and $H_1$ have been exhausted, the process is
repeated for the remaining sample paths until they have all been exhausted. Then the entire
process is repeated until all sample paths have been presented to the network $L$ times.
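
The presentation order described above can be written as a short generator. This is a sketch of the ordering only, with hypothetical names (`presentation_order`, `paths_h0`, `paths_h1`); each yielded pair would be handed to one back-propagation step.

```python
from typing import Iterator, Sequence, Tuple

Pattern = Tuple[Tuple[float, ...], Tuple[float, float]]


def presentation_order(paths_h0: Sequence[Sequence[float]],
                       paths_h1: Sequence[Sequence[float]],
                       K: int, L: int) -> Iterator[Pattern]:
    """Yield (K-tuple, desired response) pairs, alternating H0 and H1.

    Desired responses are (1, 0) for H0 and (0, 1) for H1.
    """
    for _ in range(L):                                  # L passes over all paths
        for path0, path1 in zip(paths_h0, paths_h1):    # m-th path of each hypothesis
            n_tuples = min(len(path0), len(path1)) - K + 1
            for j in range(n_tuples):                   # sliding window of adjacent samples
                yield tuple(path0[j:j + K]), (1.0, 0.0)
                yield tuple(path1[j:j + K]), (0.0, 1.0)
```

Each path of length N contributes N - K + 1 tuples, so a run with M paths per hypothesis produces 2 M (N - K + 1) L presentations in total; with M = 50, N = 1000, K = 4, and L = 100 this comes to 9,970,000.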

V.  Results 

The sequential neural network discriminators of the previous sections are evaluated.
First, the training data are obtained by a computer simulation. After training, the
discriminator is evaluated, again with simulated data. For simplicity, we take only one
envelope sample per radar return pulse. Furthermore, we assume that the radar returns are
correlated from pulse to pulse, and that the correlation structure is modeled by a first
order Markov process. The marginal pdfs of the simulated samples may be either lognormal or
Rayleigh.

The Rayleigh processes are generated by underlying Gaussian processes (i.e., the
in-phase and quadrature components). We denote the envelope observations as
$\{Z_i\}_{i=1}^{\infty}$. The Rayleigh envelope process is generated by

$$
Z_i = \sqrt{X_i^2 + Y_i^2}, \quad i = 1, 2, 3, \ldots
\tag{17}
$$

where $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ are mutually independent Gaussian stationary
first order Markov processes. This implies that $\{Z_i\}_{i=1}^{\infty}$ is also stationary and first
order Markov.

The underlying Gaussians are generated by

$$
\begin{aligned}
X_i &= \rho X_{i-1} + \sqrt{1 - \rho^2}\, V_i \\
Y_i &= \rho Y_{i-1} + \sqrt{1 - \rho^2}\, W_i, \quad \text{for } i = 2, 3, \ldots
\end{aligned}
\tag{18}
$$

with

$$
\begin{aligned}
X_1 &= \sigma V_1 \\
Y_1 &= \sigma W_1
\end{aligned}
\tag{19}
$$

where $\{V_i\}_{i=1}^{\infty}$ and $\{W_i\}_{i=1}^{\infty}$ are mutually independent sequences of i.i.d.
(independent and identically distributed) zero mean/unit variance Gaussian random
variables, $\sigma$ is the standard deviation of the underlying Gaussians, while $\rho$ is the
correlation coefficient for adjacent samples.

The correlation coefficient $\rho$ is related to the decorrelation time $\tau$ in the following
manner: $\tau$ is defined to be the time it takes for the correlation coefficient between the
first sample and another sample to decrease by a factor of $e^{-1}$.
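
This definition can be turned into an explicit relation for the underlying Gaussian sequence, under two assumptions that the text does not state outright: the samples are spaced by the uniform sampling period $T$ of Section III, and the correlation in question is that of the first order Markov Gaussians of equation (18), for which samples $k$ steps apart have correlation coefficient $\rho^k$.

```latex
% Correlation after a time lag t = kT for a first order Markov sequence:
%   \rho^{k} = \rho^{t/T}.
% Setting this equal to e^{-1} at t = \tau gives
\rho^{\tau/T} = e^{-1}
\quad\Longrightarrow\quad
\rho = e^{-T/\tau}.
```

Under these assumptions a tenfold shorter decorrelation time corresponds to raising $\rho$ to the tenth power.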

Our lognormal process is simulated by exponentiating an underlying Gaussian process:

$$
Z_i = \exp(X_i + \mu), \quad i = 1, 2, 3, \ldots
\tag{20}
$$

where $X_i$ is generated in the same manner as equations (18) and (19). Unlike the Rayleigh
processes which have underlying Gaussians with zero mean, the underlying Gaussians for
the lognormal process may have a mean $\mu$.
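
A sketch of how such sample paths might be generated with the Python standard library is given below. The function names are illustrative. One deliberate departure is flagged as an assumption: the innovation term is multiplied by $\sigma$ so that the variance remains $\sigma^2$ for every $i$; equation (18) as printed shows no such factor, and the two forms coincide when $\sigma = 1$.

```python
import math
import random
from typing import List


def gauss_markov(n: int, rho: float, sigma: float, rng: random.Random) -> List[float]:
    """First order Markov Gaussian path of length n (n >= 1)."""
    x = [sigma * rng.gauss(0.0, 1.0)]                   # X_1 = sigma * V_1
    scale = sigma * math.sqrt(1.0 - rho * rho)          # assumption: innovation scaled by sigma
    for _ in range(n - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x


def rayleigh_process(n: int, rho: float, sigma: float, rng: random.Random) -> List[float]:
    """Envelope of two independent Gaussian Markov paths, as in equation (17)."""
    x = gauss_markov(n, rho, sigma, rng)
    y = gauss_markov(n, rho, sigma, rng)
    return [math.hypot(xi, yi) for xi, yi in zip(x, y)]


def lognormal_process(n: int, rho: float, sigma: float, mu: float,
                      rng: random.Random) -> List[float]:
    """Exponentiated Gaussian Markov path, as in equation (20)."""
    return [math.exp(xi + mu) for xi in gauss_markov(n, rho, sigma, rng)]
```

Passing an explicitly seeded `random.Random` instance keeps the training and evaluation paths reproducible and independent of any global random state.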

Table 1 lists the discrimination cases tested with the sequential neural network
discriminator. For Case 1, the target's samples are from a lognormal marginal density, while
the decoy's samples are Rayleigh. The means of both hypotheses are matched; the powers
of both hypotheses are also matched. The decorrelation times are $\tau_1 = 0.130290$ and
$\tau_0 = 0.013029$, indicating that the decoy's samples become uncorrelated ten times as fast
as the target's samples do.

For Case 2, both hypotheses are Rayleigh. However, a 3 dB power difference exists in
favor of $H_1$. The decorrelation times are identical to Case 1. Both hypotheses are again
Rayleigh for Case 3. However, in Case 3 the marginal pdfs are identical for both hypotheses
(matched means and powers). The decorrelation times remain unchanged from Case 1 and
Case 2.

Table 2 details the neural networks trained for each case. Net 1 was trained for Case 1,
Net 2 for Case 2, and Net 3 for Case 3. All networks had two layers. Each network had
$K = 4$ inputs and $N_0 = 16$ nodes on its first layer of nodes. The nets were trained with
training data (generated by computer simulation) as described above. For each case, the
training data consisted of $M = 50$ sample paths from each hypothesis. Each path consisted
of $N = 1000$ samples. The number of times each sample path was presented to the network
during training was $L = 100$. The gain constant in the back-propagation algorithm (see [6])
was set to 0.001. The column labeled $S$ in Table 2 represents the performance measure
defined by equation (12). After the training process was completed, $S$ was estimated by
computing the average squared difference between the desired net output and the actual
output when the training data were again presented to the net. Recall that this measure is a
mean squared error of the actual neural network output about the desired response.

After training, the neural networks were inserted into the discriminator structure
of Figure 3. The thresholds $a$ and $b$ were chosen by experimentation; $a$ was set to $-20$ and
$b$ was set to 20. The initial value of the test statistic was set to 0. Computer simulated data
were then fed to the discriminator simulations. For each case, 10,000 simulated sample paths
from each hypothesis were presented to the discriminator for classification. Table 3 gives
the results for each case. Table 3 also gives the results of [3] for Cases 1 and 2. Optimal
memoryless nonlinearities in [3] were derived from known probability density functions.
One could expect optimal memoryless nonlinearities derived from estimated pdfs (e.g.,
estimated from the training data) to perform worse. [3] did not include results for Case
3; this was due to a limitation imposed by their performance measure, which became zero
when the marginal pdfs under both hypotheses were identical. The columns $P_1$ and $P_0$
in Table 3 represent the measured probabilities of error under $H_1$ and $H_0$, respectively.
The column labeled $E[n]$ contains the average number of samples required for the test to
terminate.

VI.  Conclusions 

A scheme for utilizing perceptron neural networks in a discrimination scheme based on
the test statistic of equation (3) was presented. This test statistic can be readily utilized in
either a block or a sequential test. The neural network's training phase eliminates the
impractical task of estimating high-order pdfs when designing a discriminator; consequently
discriminators with memory (i.e., $K > 1$) are easily obtained.

Some results were presented for a sequential implementation. The discriminators using
neural networks for their nonlinearities significantly out-performed the optimal memoryless
discriminators of [3]. The discriminators constructed with neural networks made no
classification errors in 10,000 trials from each hypothesis. These discriminators also used
a significantly smaller expected number of samples to make their decisions than did the
discriminators of [3] (an average of 32 versus 572 samples for Case 1, and 55 versus 2025
for Case 2; refer to Table 3). Furthermore, the neural networks were able to
converge to a nonlinearity for Case 3; [3] could not consider Case 3 because
the marginal pdfs under both hypotheses were identical.

Figure 1. A Multiple Layer Perceptron Neural Network

[Figure 2 plot labels: "Optimal Path for H1", "A Non-Optimal Path for H0"]



Table 1: Description of the Discrimination Test Cases

Case     pdfs                     Power Ratio   Mean Ratio    Decorrelation Times
         (H1 vs H0)               (H1 vs H0)    (H1 vs H0)    (tau_1, tau_0)

Case 1   Lognormal vs Rayleigh    0 dB          0 dB          0.130290, 0.013029
Case 2   Rayleigh vs Rayleigh     3 dB          -             0.130290, 0.013029
Case 3   Rayleigh vs Rayleigh     0 dB          -             0.130290, 0.013029

Table 2: Neural Networks for Each Case

Case     Network   K    N0    S

Case 1   Net 1     4    16    3.259 x 10^-6
Case 2   Net 2     4    16    4.075 x 10^-6
Case 3   Net 3     4    16    4.012 x 10^-6


Table 3: Results for 10,000 Sample Paths from Each Hypothesis

Case     Discriminator   P1     P0     E[n]

Case 1   Net 1           0      0      32
Case 1   Memoryless      .010   0      572
Case 2   Net 2           0      0      55
Case 2   Memoryless      .003   .002   2025
Case 3   Net 3           0      0      41


